AITopics

2510.01153

Country: North America > United States > California (0.27)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Aug-5-2025

Neural Policy Iteration for Stochastic Optimal Control: A Physics-Informed Approach

Kim, Yeongjong, Kim, Yeoneung, Kim, Minseok, Cho, Namkyeong

We propose a physics-informed neural network policy iteration (PINN-PI) framework for solving stochastic optimal control problems governed by second-order Hamilton--Jacobi--Bellman (HJB) equations. At each iteration, a neural network is trained to approximate the value function by minimizing the residual of a linear PDE induced by a fixed policy. This linear structure enables systematic $L^2$ error control at each policy evaluation step, and allows us to derive explicit Lipschitz-type bounds that quantify how value gradient errors propagate to the policy updates. This interpretability provides a theoretical basis for evaluating policy quality during training. Our method extends recent deterministic PINN-based approaches to stochastic settings, inheriting the global exponential convergence guarantees of classical policy iteration under mild conditions. We demonstrate the effectiveness of our method on several benchmark problems, including stochastic cartpole, pendulum problems and high-dimensional linear quadratic regulation (LQR) problems in up to 10D.
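The evaluate-then-improve loop that the abstract builds on can be sketched in the one setting where every step has a closed form: deterministic linear-quadratic regulation. The sketch below is the classical Kleinman policy iteration, not the paper's PINN-PI method. Policy evaluation for a fixed gain `K` reduces to a linear Lyapunov equation (the LQR analogue of the linear PDE induced by a fixed policy), and policy improvement is a gradient-based update of the gain. The function names, the double-integrator system, and the initial gain `K0` are illustrative assumptions, and `K0` must be stabilizing.

```python
import numpy as np


def lyapunov_solve(A_cl, Q_cl):
    """Solve A_cl^T P + P A_cl + Q_cl = 0 for P via Kronecker vectorization."""
    n = A_cl.shape[0]
    I = np.eye(n)
    # Column-major vec: vec(A^T P) = (I kron A^T) vec(P), vec(P A) = (A^T kron I) vec(P)
    M = np.kron(I, A_cl.T) + np.kron(A_cl.T, I)
    p = np.linalg.solve(M, -Q_cl.reshape(-1, order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # symmetrize against round-off


def policy_iteration_lqr(A, B, Q, R, K0, iters=30):
    """Kleinman iteration for dx/dt = A x + B u with running cost x^T Q x + u^T R u.

    K0 is assumed to be stabilizing (A - B K0 has eigenvalues in the open left half-plane).
    """
    K = K0
    for _ in range(iters):
        A_cl = A - B @ K
        # Policy evaluation: linear equation for the value matrix under the fixed policy u = -K x
        P = lyapunov_solve(A_cl, Q + K.T @ R @ K)
        # Policy improvement: minimize the Hamiltonian pointwise, giving K = R^{-1} B^T P
        K = np.linalg.solve(R, B.T @ P)
    return P, K


if __name__ == "__main__":
    # Hypothetical test system: a double integrator with unit weights
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    K0 = np.array([[1.0, 1.0]])  # stabilizing initial gain (assumption)
    P, K = policy_iteration_lqr(A, B, Q, R, K0)
    print(P)
    print(K)
```

For this double integrator the algebraic Riccati equation can be solved by hand, so the iterates can be checked against the exact value matrix with entries $\sqrt{3}$ on the diagonal and $1$ off the diagonal.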

artificial intelligence, iteration, machine learning, (17 more...)

2508.01718

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

A Mean-Field Theory of $Θ$-Expectations

arXiv.org Artificial Intelligence, Jul-31-2025

The canonical theory of sublinear expectations, a foundation of stochastic calculus under ambiguity, is insensitive to the non-convex geometry of primitive uncertainty models. This paper develops a new stochastic calculus for a structured class of such non-convex models. We introduce a class of fully coupled Mean-Field Forward-Backward Stochastic Differential Equations where the BSDE driver is defined by a pointwise maximization over a law-dependent, non-convex set. Mathematical tractability is achieved via a uniform strong concavity assumption on the driver with respect to the control variable, which ensures the optimization admits a unique and stable solution. A central contribution is to establish the Lipschitz stability of this optimizer from primitive geometric and regularity conditions, which underpins the entire well-posedness theory. We prove local and global well-posedness theorems for the FBSDE system. The resulting valuation functional, the $Θ$-Expectation, is shown to be dynamically consistent and, most critically, to violate the axiom of sub-additivity. This, along with its failure to be translation invariant, demonstrates its fundamental departure from the convex paradigm. This work provides a rigorous foundation for stochastic calculus under a class of non-convex, endogenous ambiguity.

artificial intelligence, assumption, equation, (17 more...)

2507.22577

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

A Theory of $θ$-Expectations

arXiv.org Machine Learning, Jul-29-2025

The canonical theory of stochastic calculus under ambiguity, founded on sub-additivity, is insensitive to non-convex uncertainty structures, leading to an identifiability impasse. This paper develops a mathematical framework for an identifiable calculus sensitive to non-convex geometry. We introduce the $θ$-BSDE, a class of backward stochastic differential equations where the driver is determined by a pointwise maximization over a primitive, possibly non-convex, uncertainty set. The system's tractability is predicated not on convexity, but on a global analytic hypothesis: the existence of a unique and globally Lipschitz maximizer map for the driver function. Under this hypothesis, which carves out a tractable class of models, we establish well-posedness via a fixed-point argument. For a distinct, geometrically regular class of models, we prove a result of independent interest: under non-degeneracy conditions from Malliavin calculus, the maximizer is unique along any solution path, ensuring the model's internal consistency. We clarify the fundamental logical gap between this pathwise property and the global regularity required by our existence proof. The resulting valuation operator defines a dynamically consistent expectation, and we establish its connection to fully nonlinear PDEs via a Feynman-Kac formula.

artificial intelligence, hypothesis 3, uniqueness, (14 more...)

2507.20353

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Yang, Hee Jun, Gim, Minjung, Kim, Yeoneung

Solving nonconvex Hamilton--Jacobi--Isaacs equations with PINN-based policy iteration

arXiv.org Artificial Intelligence, Jul-24-2025

We propose a mesh-free policy iteration framework that combines classical dynamic programming with physics-informed neural networks (PINNs) to solve high-dimensional, nonconvex Hamilton--Jacobi--Isaacs (HJI) equations arising in stochastic differential games and robust control. The method alternates between solving linear second-order PDEs under fixed feedback policies and updating the controls via pointwise minimax optimization using automatic differentiation. Under standard Lipschitz and uniform ellipticity assumptions, we prove that the value function iterates converge locally uniformly to the unique viscosity solution of the HJI equation. The analysis establishes equi-Lipschitz regularity of the iterates, enabling provable stability and convergence without requiring convexity of the Hamiltonian. Numerical experiments demonstrate the accuracy and scalability of the method. In a two-dimensional stochastic path-planning game with a moving obstacle, our method matches finite-difference benchmarks with relative $L^2$-errors below $10^{-2}$. In five- and ten-dimensional publisher-subscriber differential games with anisotropic noise, the proposed approach consistently outperforms direct PINN solvers, yielding smoother value functions and lower residuals. Our results suggest that integrating PINNs with policy iteration is a practical and theoretically grounded method for solving high-dimensional, nonconvex HJI equations, with potential applications in robotics, finance, and multi-agent reinforcement learning.

artificial intelligence, deep learning, machine learning, (18 more...)

2507.15455

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Robust Control with Gradient Uncertainty

arXiv.org Artificial Intelligence, Jul-22-2025

We introduce a novel extension to robust control theory that explicitly addresses uncertainty in the value function's gradient, a form of uncertainty endemic to applications like reinforcement learning where value functions are approximated. We formulate a zero-sum dynamic game where an adversary perturbs both system dynamics and the value function gradient, leading to a new, highly nonlinear partial differential equation: the Hamilton-Jacobi-Bellman-Isaacs Equation with Gradient Uncertainty (GU-HJBI). We establish its well-posedness by proving a comparison principle for its viscosity solutions under a uniform ellipticity condition. Our analysis of the linear-quadratic (LQ) case yields a key insight: we prove that the classical quadratic value function assumption fails for any non-zero gradient uncertainty, fundamentally altering the problem structure. A formal perturbation analysis characterizes the non-polynomial correction to the value function and the resulting nonlinearity of the optimal control law, which we validate with numerical studies. Finally, we bridge theory to practice by proposing a novel Gradient-Uncertainty-Robust Actor-Critic (GURAC) algorithm, accompanied by an empirical study demonstrating its effectiveness in stabilizing training. This work provides a new direction for robust control, holding significant implications for fields where function approximation is common, including reinforcement learning and computational finance.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2507.15082

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.46)
Information Technology (0.34)
Banking & Finance (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Hamiltonian Operator

arXiv.org Artificial Intelligence, Jul-3-2025

Stochastic control problems in high dimensions are notoriously difficult to solve due to the curse of dimensionality. An alternative to traditional dynamic programming is Pontryagin's Maximum Principle (PMP), which recasts the problem as a system of Forward-Backward Stochastic Differential Equations (FBSDEs). In this paper, we introduce a formal framework for solving such problems with deep learning by defining a Neural Hamiltonian Operator (NHO). This operator parameterizes the coupled FBSDE dynamics via neural networks that represent the feedback control and an ansatz for the value function's spatial gradient. We show how the optimal NHO can be found by training the underlying networks to enforce the consistency conditions dictated by the PMP. By adopting this operator-theoretic view, we situate the deep FBSDE method within the rigorous language of statistical inference, framing it as a problem of learning an unknown operator from simulated data. This perspective allows us to prove the universal approximation capabilities of NHOs under general martingale drivers and provides a clear lens for analyzing the significant optimization challenges inherent to this class of models.

artificial intelligence, machine learning, optimization problem, (18 more...)

2507.01313

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Li, Boyang, Gong, Zheng, Herbert, Sylvia

Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability

arXiv.org Artificial Intelligence, May-15-2025

In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.

artificial intelligence, machine learning, value function, (16 more...)

2505.09067

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > North Carolina (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Apr-15-2025

Reasoning without Regret

Chitra, Tarun

Chain-of-thought reasoning enables large language models to solve multi-step tasks by framing problem solving as sequential decision problems. Outcome-based rewards, which provide feedback only on final answers, show impressive success, but face challenges with credit assignment and slow convergence. In contrast, procedure-based rewards offer efficient step-level feedback, but typically require costly human supervision. We introduce Backwards Adaptive Reward Shaping (BARS), a no-regret framework that converts sparse outcome-based rewards into effective procedure-based signals. BARS uses sparse rewards generated from terminal-state priors and cover trees to scale rewards while preventing exploitation. With Bellman contraction and $(Δ, ε)$-gap rewards, our backward Euler solver achieves $ε$-accuracy in $O\left((R_{\max}/Δ)\log(1/ε)\right)$ iterations with $O(\log T)$ dynamic regret over $T$ rounds. Our analysis, based on generic chaining, continuous scaling limits, and non-linear Feynman-Kac bounds, connects recent outcome-based methods' empirical successes with the benefits of intermediate supervision. Combined, this provides the first rigorous no-regret algorithm for outcome reward shaping, providing a theoretical foundation for the empirical success of DeepSeek's R1.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2504.09777

Genre:

Research Report (0.63)
Workflow (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Solanki, Prashant, Vertovec, Nikolaus, Schnitzer, Yannik, Van Beers, Jasper, de Visser, Coen, Abate, Alessandro

Certified Approximate Reachability (CARe): Formal Error Bounds on Deep Learning of Reachable Sets

arXiv.org Artificial Intelligence, Mar-31-2025

Recent approaches to leveraging deep learning for computing reachable sets of continuous-time dynamical systems have gained popularity over traditional level-set methods, as they overcome the curse of dimensionality. However, as with level-set methods, considerable care needs to be taken in limiting approximation errors, particularly since no guarantees are provided during training on the accuracy of the learned reachable set. To address this limitation, we introduce an ϵ-approximate Hamilton-Jacobi Partial Differential Equation (HJ-PDE), which establishes a relationship between training loss and accuracy of the true reachable set. To formally certify this approximation, we leverage Satisfiability Modulo Theories (SMT) solvers to bound the residual error of the HJ-based loss function across the domain of interest. Leveraging Counter Example Guided Inductive Synthesis (CEGIS), we close the loop around learning and verification, by fine-tuning the neural network on counterexamples found by the SMT solver, thus improving the accuracy of the learned reachable set. To the best of our knowledge, Certified Approximate Reachability (CARe) is the first approach to provide soundness guarantees on learned reachable sets of continuous dynamical systems.

artificial intelligence, deep learning, machine learning, (18 more...)

2503.23912

Country:

North America > United States > Indiana (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.50)

Industry: Transportation (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)